Data: World University Rankings 2023¶
About: This 2023 dataset includes 13 performance indicator measures (variables) across four areas of teaching, research, knowledge transfer and international outlook for 1,799 universities globally. The dataset includes over 680,000 data points from Times Higher Education's survey submissions from 40,000 scholars, 121 million citations, and 15.5 million research publications at over 2,500 universities.
There are 2341 observations in this dataset.
| # | Variable | Variable Type | Description |
|---|---|---|---|
| 1 | University rank | chr | Rank of specific university all over the world |
| 2 | University name | chr | Specific name of University |
| 3 | Location | chr | Physical place where university exists |
| 4 | No. of students | dbl | Number of students enrolled at the university as of 2023 |
| 5 | No. of students per staff | dbl | Number of students under one Professor |
| 6 | International students | chr | Percentage of International Students |
| 7 | Female : male ratio | chr | A ratio of female to male students respectively |
| 8 | Overall score | chr | The combined weighted scores of those given below. Out of 100 |
| 9 | Teaching score | chr | The perceived prestige of the institution based on the Academic Reputation Survey. Out of 100. |
| 10 | Research score | chr | Reputation for research excellence amongst peers based on the Academic Reputation Survey. Out of 100 |
| 11 | Citations score | chr | The number of citations received by a journal in one year to documents published in the three previous years, divided by the number of documents indexed in Scopus published in those same three years. Out of 100. |
| 12 | Industry income score | chr | How much money a university receives from the working industry in exchange for its academic expertise. Out of 100 |
| 13 | International outlook score | chr | The ability of a university to attract undergraduates, postgraduates and faculty from all over the globe. |
Question¶
How do Female:Male Ratio and International Student % affect University Rank?
#install/load required packages
library(tidyverse)
library(broom)
library(infer)
library(GGally)
rankings_df <- read_csv(url("https://raw.githubusercontent.com/sabrinalou/stat-301-project/main/World%20University%20Rankings%202023.csv"))
head(rankings_df)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.4 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.4.4 ✔ tibble 3.2.1 ✔ lubridate 1.9.3 ✔ tidyr 1.3.0 ✔ purrr 1.0.2 ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors Registered S3 method overwritten by 'GGally': method from +.gg ggplot2 Rows: 2341 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (11): University Rank, Name of University, Location, International Stude... dbl (1): No of student per staff num (1): No of student ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| University Rank | Name of University | Location | No of student | No of student per staff | International Student | Female:Male Ratio | OverAll Score | Teaching Score | Research Score | Citations Score | Industry Income Score | International Outlook Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <chr> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> |
| 1 | University of Oxford | United Kingdom | 20965 | 10.6 | 42% | 48 : 52 | 96.4 | 92.3 | 99.7 | 99.0 | 74.9 | 96.2 |
| 2 | Harvard University | United States | 21887 | 9.6 | 25% | 50 : 50 | 95.2 | 94.8 | 99.0 | 99.3 | 49.5 | 80.5 |
| 3 | University of Cambridge | United Kingdom | 20185 | 11.3 | 39% | 47 : 53 | 94.8 | 90.9 | 99.5 | 97.0 | 54.2 | 95.8 |
| 3 | Stanford University | United States | 16164 | 7.1 | 24% | 46 : 54 | 94.8 | 94.2 | 96.7 | 99.8 | 65.0 | 79.8 |
| 5 | Massachusetts Institute of Technology | United States | 11415 | 8.2 | 33% | 40 : 60 | 94.2 | 90.7 | 93.6 | 99.8 | 90.9 | 89.3 |
| 6 | California Institute of Technology | United States | 2237 | 6.2 | 34% | 37 : 63 | 94.1 | 90.9 | 97.0 | 97.3 | 89.8 | 83.6 |
First, we convert the relevant character columns to numeric so that they can be treated as continuous variables in the visualizations, and rename them so they are easier to reference in code. This applies to University Rank, International Student (percentage), Female:Male Ratio, and the various scores.
All observations with "NA" values in University Rank are removed, since rank is the critical variable in our visualizations. These NA values arise when a rank entry cannot be coerced to a number. This leaves 199 universities out of the original 2341.
# replacing spaces in column names with periods
names(rankings_df) <- gsub("\\s+", ".", names(rankings_df))
# changing columns to correct data types, renaming, creating new columns, and removing all NA ranking universities
rankings <- rankings_df |>
  mutate(across(c(University.Rank, Teaching.Score, OverAll.Score, Research.Score,
                  Citations.Score, Industry.Income.Score, International.Outlook.Score),
                as.numeric)) |>
  mutate(International.Student = as.numeric(gsub("%", "", International.Student)) / 100) |>
  rename(International.Student.Percent = International.Student,
         Student.per.Staff = No.of.student.per.staff,
         Students = No.of.student,
         Overall.Score = OverAll.Score) |>
  separate(`Female:Male.Ratio`, into = c("Female", "Male"), sep = " : ", convert = TRUE) |>
  mutate(Female.Male.Ratio = Female / Male) |>
  select(-Female, -Male) |>
  filter(!is.na(University.Rank))
head(rankings)
tail(rankings)
str(rankings)
Warning message: “There were 7 warnings in `mutate()`. The first warning was: ℹ In argument: `across(...)`. Caused by warning: ! NAs introduced by coercion ℹ Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.”
| University.Rank | Name.of.University | Location | Students | Student.per.Staff | International.Student.Percent | Overall.Score | Teaching.Score | Research.Score | Citations.Score | Industry.Income.Score | International.Outlook.Score | Female.Male.Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | University of Oxford | United Kingdom | 20965 | 10.6 | 0.42 | 96.4 | 92.3 | 99.7 | 99.0 | 74.9 | 96.2 | 0.9230769 |
| 2 | Harvard University | United States | 21887 | 9.6 | 0.25 | 95.2 | 94.8 | 99.0 | 99.3 | 49.5 | 80.5 | 1.0000000 |
| 3 | University of Cambridge | United Kingdom | 20185 | 11.3 | 0.39 | 94.8 | 90.9 | 99.5 | 97.0 | 54.2 | 95.8 | 0.8867925 |
| 3 | Stanford University | United States | 16164 | 7.1 | 0.24 | 94.8 | 94.2 | 96.7 | 99.8 | 65.0 | 79.8 | 0.8518519 |
| 5 | Massachusetts Institute of Technology | United States | 11415 | 8.2 | 0.33 | 94.2 | 90.7 | 93.6 | 99.8 | 90.9 | 89.3 | 0.6666667 |
| 6 | California Institute of Technology | United States | 2237 | 6.2 | 0.34 | 94.1 | 90.9 | 97.0 | 97.3 | 89.8 | 83.6 | 0.5873016 |
| University.Rank | Name.of.University | Location | Students | Student.per.Staff | International.Student.Percent | Overall.Score | Teaching.Score | Research.Score | Citations.Score | Industry.Income.Score | International.Outlook.Score | Female.Male.Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 194 | University of Miami | United States | 17009 | 10.8 | 0.16 | 54.6 | 48.4 | 33.5 | 81.0 | 48.3 | 60.2 | 1.127660 |
| 196 | University of Erlangen-Nuremberg | Germany | 30303 | 43.4 | 0.13 | 54.5 | 44.6 | 47.5 | 68.8 | 90.7 | 53.5 | 1.040816 |
| 196 | Sichuan University | China | 49543 | 15.8 | 0.06 | 54.5 | 57.1 | 58.6 | 48.6 | 93.4 | 38.7 | NA |
| 198 | Durham University | United Kingdom | 18425 | 14.1 | 0.35 | 54.4 | 40.0 | 44.6 | 70.0 | 39.4 | 94.3 | 1.173913 |
| 198 | Queen’s University Belfast | NA | 19060 | 15.8 | 0.39 | 54.4 | 31.1 | 37.9 | 84.4 | 41.6 | 97.4 | 1.325581 |
| 198 | University of Reading | United Kingdom | 15720 | 16.4 | 0.32 | 54.4 | 36.5 | 39.6 | 78.5 | 42.2 | 93.3 | 1.272727 |
tibble [199 × 13] (S3: tbl_df/tbl/data.frame) $ University.Rank : num [1:199] 1 2 3 3 5 6 7 8 9 10 ... $ Name.of.University : chr [1:199] "University of Oxford" "Harvard University" "University of Cambridge" "Stanford University" ... $ Location : chr [1:199] "United Kingdom" "United States" "United Kingdom" "United States" ... $ Students : num [1:199] 20965 21887 20185 16164 11415 ... $ Student.per.Staff : num [1:199] 10.6 9.6 11.3 7.1 8.2 6.2 8 18.4 5.9 11.2 ... $ International.Student.Percent: num [1:199] 0.42 0.25 0.39 0.24 0.33 0.34 0.23 0.24 0.21 0.61 ... $ Overall.Score : num [1:199] 96.4 95.2 94.8 94.8 94.2 94.1 92.4 92.1 91.4 90.4 ... $ Teaching.Score : num [1:199] 92.3 94.8 90.9 94.2 90.7 90.9 87.6 86.4 92.6 82.8 ... $ Research.Score : num [1:199] 99.7 99 99.5 96.7 93.6 97 95.9 95.8 92.7 90.8 ... $ Citations.Score : num [1:199] 99 99.3 97 99.8 99.8 97.3 99.1 99 97 98.3 ... $ Industry.Income.Score : num [1:199] 74.9 49.5 54.2 65 90.9 89.8 66 76.8 55 59.8 ... $ International.Outlook.Score : num [1:199] 96.2 80.5 95.8 79.8 89.3 83.6 80.3 78.4 70.9 97.5 ... $ Female.Male.Ratio : num [1:199] 0.923 1 0.887 0.852 0.667 ...
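The conversions used in this cleaning step can be illustrated on a few raw values. The following is a minimal base-R sketch: the percentage and ratio strings are taken from the first row of the raw table, while the non-numeric rank strings (`"201-250"`, `"Reporter"`) are assumed examples of the kind of entry that `as.numeric()` turns into `NA`; the raw file should be checked for the actual values.

```r
# Raw strings as they appear in the first row of the raw table
intl_raw  <- "42%"
ratio_raw <- "48 : 52"

# Percentage: strip the "%" sign, then scale to a proportion
intl_prop <- as.numeric(sub("%", "", intl_raw, fixed = TRUE)) / 100

# Ratio: split on " : " and divide the female share by the male share
parts    <- as.numeric(strsplit(ratio_raw, " : ", fixed = TRUE)[[1]])
fm_ratio <- parts[1] / parts[2]

# ASSUMPTION: hypothetical rank entries that are not plain numbers
rank_raw <- c("1", "198", "201-250", "Reporter")
rank_num <- suppressWarnings(as.numeric(rank_raw))

print(intl_prop)
print(round(fm_ratio, 7))
print(sum(is.na(rank_num)))  # number of ranks lost to coercion
```

Counting the `NA` values this way, before filtering, is a quick check on how many universities the `filter(!is.na(University.Rank))` step drops.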
Visualizations¶
# a scatterplot matrix to assess linear correlations between selected variables
rankings_matrix <- rankings |>
  select(University.Rank, Student.per.Staff, International.Student.Percent, Overall.Score,
         Teaching.Score, Research.Score, Citations.Score, Industry.Income.Score,
         International.Outlook.Score, Female.Male.Ratio) |>
  ggpairs() +
  ggtitle("Figure 1. Scatterplot Matrix of 'Rankings' with Selected Variables")
rankings_matrix
Warning message in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, : “Removed 26 rows containing missing values” Warning message: “Removed 26 rows containing missing values (`geom_point()`).” Warning message: “Removed 26 rows containing non-finite values (`stat_density()`).”
From Figure 1, we can see that University Rank has strong negative correlations (|r| > 0.7) with Overall Score, Teaching Score, and Research Score, which is expected. This negative correlation means that higher scores are associated with numerically lower rank values, i.e. "better" rankings. Among the non-score variables, the strongest correlations with University Rank are those of International Student Percentage and Student per Staff.
Female:Male Ratio does not have a strong correlation with any of the variables. Its correlations are mostly negative, with weak positive correlations only for Students per Staff and Citations Score. It also has a weak positive correlation with University Rank, which means that "better"-ranked schools tend to have slightly lower female-to-male ratios.
Overall Score has strong positive correlations with Teaching and Research scores and a strong negative correlation with University Rank (as expected). It also has positive correlations with Citations, Industry Income, and International Outlook scores, in descending order of strength. This is interesting, as we can examine which scores influence the overall score the most, and we can further explore which unexpected variables may influence each individual score category (i.e. Female:Male Ratio and International Student Percentage). Overall Score has a weak negative correlation with Female:Male Ratio.
International Student Percentage has a strong correlation with International Outlook (as expected). The next strongest are with Overall Score, University Rank, and the other scores. Note that it is negatively correlated with both Industry Income Score and Female:Male Ratio.
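The correlations read off Figure 1 can also be computed directly with `cor()`. As a minimal sketch, the data frame below holds only the twelve rows printed by `head()` and `tail()` in the cleaning step, so the numbers it returns illustrate the call and are not the full-sample correlations; `toy` is a hypothetical name.

```r
# Twelve rows copied from the head() and tail() output of `rankings`
toy <- data.frame(
  University.Rank = c(1, 2, 3, 3, 5, 6, 194, 196, 196, 198, 198, 198),
  Overall.Score   = c(96.4, 95.2, 94.8, 94.8, 94.2, 94.1,
                      54.6, 54.5, 54.5, 54.4, 54.4, 54.4),
  International.Student.Percent = c(0.42, 0.25, 0.39, 0.24, 0.33, 0.34,
                                    0.16, 0.13, 0.06, 0.35, 0.39, 0.32),
  Female.Male.Ratio = c(0.9230769, 1.0000000, 0.8867925, 0.8518519,
                        0.6666667, 0.5873016, 1.127660, 1.040816,
                        NA, 1.173913, 1.325581, 1.272727)
)

# Pairwise deletion keeps every row that is complete for a given pair,
# so the single missing ratio only affects correlations involving it
cor_mat <- cor(toy, use = "pairwise.complete.obs")
print(round(cor_mat, 2))
```

On the full data the same call could be applied to the numeric columns of `rankings`; a numeric matrix makes it easier to rank correlations by strength than reading them off the plot.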
From this scatterplot matrix, we proceed by visualizing some of the explanatory variables of interest with boxplots. Boxplots summarize the distribution of a continuous variable (i.e. its median, quartiles, and outliers) across the levels of a categorical variable. Therefore, I will split the University Rank variable into four equal-width rank bands to see whether the relationships behave differently across the ranking. This allows us to assess the relationships between Female:Male Ratio, University Rank, and International Student Percentage.
Furthermore, I would like to explore each score category's relationship with the non-score variables to identify any hidden relationships that aren't explained by the correlation of Overall Score with other variables.
# mean and median female:male ratio calculations
ratio_mean <- mean(rankings$Female.Male.Ratio, na.rm = TRUE)
ratio_median <- median(rankings$Female.Male.Ratio, na.rm = TRUE)
# mean and median intl student % calculations
intl_mean <- mean(rankings$International.Student.Percent, na.rm = TRUE)
intl_median <- median(rankings$International.Student.Percent, na.rm = TRUE)
# mean and median overall score calculations
overall_mean <- mean(rankings$Overall.Score, na.rm = TRUE)
overall_median <- median(rankings$Overall.Score, na.rm = TRUE)
# University.Rank column as a factor for boxplots
# right-closed bins (0,50], (50,100], ... so rank 50 is in "1 to 50" and rank 51 in "51 to 100"
rank_categories <- cut(rankings$University.Rank,
                       breaks = c(0, 50, 100, 150, 200),
                       labels = c("1 to 50", "51 to 100", "101 to 150", "151 to 200"))
rankings_factored <- mutate(rankings, University.Rank = as.factor(rank_categories))
# ranking boxplots
# female:male ratio boxplot
ggplot(data = rankings_factored, aes(x = University.Rank, y = Female.Male.Ratio)) +
  geom_boxplot() +
  labs(x = "University Rank", y = "Female:Male Ratio") +
  ggtitle("Figure 2A. Boxplot of Female:Male Ratio across Rank") +
  geom_hline(yintercept = ratio_median, color = "red") +
  # annotate() draws the label once instead of once per data row
  annotate("text", x = 4, y = ratio_median - 0.1, label = "Median",
           color = "red", hjust = 0, vjust = -1, size = 3) +
  theme(
    text = element_text(size = 14),
    plot.title = element_text(size = 12, face = "bold"),
    axis.title = element_text(face = "bold")
  )
# intl student % boxplot
ggplot(data = rankings_factored, aes(x = University.Rank, y = International.Student.Percent)) +
  geom_boxplot() +
  labs(x = "University Rank", y = "International Student %") +
  ggtitle("Figure 2B. Boxplot of International Student % across Rank") +
  geom_hline(yintercept = intl_median, color = "red") +
  # annotate() draws each label once instead of once per data row
  annotate("text", x = 4, y = intl_median, label = "Median",
           color = "red", hjust = 0, vjust = -1, size = 3) +
  geom_hline(yintercept = intl_mean, color = "blue") +
  annotate("text", x = 1, y = intl_mean - 0.03, label = "Mean",
           color = "blue", hjust = 0, vjust = -1, size = 3) +
  theme(
    text = element_text(size = 14),
    plot.title = element_text(size = 12, face = "bold"),
    axis.title = element_text(face = "bold")
  )
Warning message: “Removed 26 rows containing non-finite values (`stat_boxplot()`).”
From the Figure 2 boxplots, we notice that Female:Male Ratio fluctuates across the University Rank groups, suggesting a relationship that may be more complex than linear and is worth exploring further. International Student Percentage and Overall Score showed steadier trends across the groups, in line with the correlation values we saw in Figure 1.
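Because the boxplots depend on how ranks are binned, it is worth checking how `cut()` treats ranks that sit on a bin boundary. The sketch below is illustrative only: the `ranks` vector is made up, and it compares two sets of breaks under `cut()`'s default right-closed intervals.

```r
# Made-up ranks sitting on or next to the band boundaries
ranks  <- c(1, 50, 51, 100, 101, 150, 151, 198)
labels <- c("1 to 50", "51 to 100", "101 to 150", "151 to 200")

# Breaks at 0/50/100/150/200 give (0,50], (50,100], (100,150], (150,200]
bands <- cut(ranks, breaks = c(0, 50, 100, 150, 200), labels = labels)

# Breaks at 1/51/101/151/201 give [1,51], (51,101], (101,151], (151,201]
bands_shifted <- cut(ranks, breaks = c(1, 51, 101, 151, 201),
                     labels = labels, include.lowest = TRUE)

print(data.frame(rank = ranks, bands = bands, bands_shifted = bands_shifted))
```

With the shifted breaks, ranks 51, 101 and 151 are placed one band too low relative to the labels, so the choice of breaks should match the labels being reported.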
Score Exploration¶
The scatterplot matrices below show how each score category that contributes to the Overall Score correlates with our two variables of interest. Exploring these relationships may reveal indirect effects of Female:Male Ratio and International Student % on the Overall Score, and thus on University Rank.
# escape the dot and anchor at the end so only columns ending in ".Score" match
score_columns <- grep("\\.Score$", names(rankings), value = TRUE)
# build one scatterplot matrix of a score column against the variables of interest
scores_matrix <- function(df, x_var, y_vars) {
  title <- paste("Scatterplot Matrix for", x_var)
  ggpairs(df, columns = c(x_var, y_vars),
          lower = list(continuous = "points"),
          diag = list(continuous = "blank")) +
    ggtitle(title)
}
scores_matrices <- lapply(score_columns, function(score_col) {
  scores_matrix(rankings, score_col, c("Female.Male.Ratio", "International.Student.Percent"))
})
# Display the scatterplot matrices and figure title
print("Figure 3. Scatterplot Matrices for Score Categories vs Female:Male Ratio and International Student %")
scores_matrices
Warning message in check_and_set_ggpairs_defaults("diag", diag, continuous = "densityDiag", :
“Changing diag$continuous from 'blank' to 'blankDiag'”
[1] "Figure 3. Scatterplot Matrices for Score Categories vs Female:Male Ratio and International Student %"
Warning message in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, : “Removed 26 rows containing missing values” Warning message: “Removed 26 rows containing missing values (`geom_point()`).”
[[1]] [[2]] [[3]] [[4]] [[5]] [[6]]
Assignment 3: Methods and Plans¶
This section proposes a method to address the research question: How do Female:Male Ratio and International Student % affect University Rank? We will explore this by examining the relationships between Female:Male Ratio, International Student %, the score category variables, Overall Score, and University Rank for the 199 institutions that have a numeric rank in the dataset.
Proposed Method: Multiple Linear Regression with Interaction Term¶
Multiple linear regression (MLR) is a suitable method for this analysis because:
- Multiple predictors: It allows examination of the relationship between multiple independent variables (Female:Male Ratio, International Student %, Teaching Score, Citation Score, Research Score, Industry Income Score, International Outlook Score, and Overall Score) and a single dependent variable, University Rank.
- Modeling a continuous outcome: The score categories, ratio, and percentage are continuous variables. Linear regression can model the relationship between continuous independent variables and a continuous dependent variable, provided that relationship is approximately linear.
- Interaction between explanatory variables: Including interaction terms in the model allows us to analyze how Female:Male Ratio and International Student % relate to the score category variables. A statistically significant interaction term would provide evidence that Female:Male Ratio and International Student % affect how the Overall Score, and thus University Rank, is obtained.
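When specifying interaction terms in R, it helps to see how a model formula expands. The sketch below uses placeholder variable names (`y`, `a` to `e`, which are not columns of the dataset) and only inspects the formula, so no data is needed.

```r
# In an R formula, a * b is shorthand for a + b + a:b
two_way <- attr(terms(y ~ a * b), "term.labels")
print(two_way)

# Chaining * across five predictors produces every main effect and
# every interaction of every order
five_way <- attr(terms(y ~ a * b * c * d * e), "term.labels")
print(length(five_way))

# A single named product term can be added with `:` instead
one_term <- attr(terms(y ~ a + b + c + d:e), "term.labels")
print(one_term)
```

Counting the terms before fitting shows how quickly a fully crossed model grows relative to the number of complete observations, which matters for the stability of the estimates.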
Limitations of Linear Regression:¶
Linearity: The model assumes that the relationship between the variables is linear, when in reality it might not be.
Outliers: Outliers in the dataset can significantly impact the results of linear regression and lead to inaccurate results.
Multicollinearity: If the independent variables are highly correlated, it can lead to unstable coefficient estimates. We can check for multicollinearity using correlation analysis and variance inflation factors (VIF).
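The variance inflation factor mentioned above can be computed from first principles: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below is a hedged illustration on R's built-in `mtcars` data, used here purely as a stand-in; `vif_manual` is a hypothetical helper name, not a function from any package.

```r
# Stand-in data: three numeric predictors from the built-in mtcars dataset
predictors <- mtcars[, c("disp", "hp", "wt")]

# Hypothetical helper: regress each column on all the others and
# convert the resulting R-squared into a variance inflation factor
vif_manual <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

vifs <- vif_manual(predictors)
print(round(vifs, 2))
```

For this analysis the same helper could be applied to the score columns of `rankings` after dropping rows with missing values. Larger VIF values indicate that a predictor is largely explained by the others.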
print("Table 1A. University Rank Additive Model with Categorical Scores as Covariates")
# additive model: rank regressed on the five score categories
rankings_mlr <- lm(University.Rank ~ Teaching.Score + Research.Score + Citations.Score +
                     Industry.Income.Score + International.Outlook.Score, data = rankings) |>
  tidy(conf.int = TRUE, conf.level = 0.95) |>
  mutate(across(where(is.numeric), \(x) round(x, 2)))
rankings_mlr
print("Table 1B. University Rank Interaction Model with Major Categorical Scores, Female:Male Ratio, and International Student % as Covariates")
# fully crossed interaction model; only the first 16 terms are displayed
rankings_mlr_int <- lm(University.Rank ~ Teaching.Score * Research.Score * Citations.Score *
                         Female.Male.Ratio * International.Student.Percent, data = rankings) |>
  tidy(conf.int = TRUE, conf.level = 0.95) |>
  mutate(across(where(is.numeric), \(x) round(x, 2)))
head(rankings_mlr_int, n = 16)
[1] "Table 1A. University Rank Additive Model with Categorical Scores as Covariates"
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| (Intercept) | 422.99 | 15.12 | 27.97 | 0.00 | 393.16 | 452.82 |
| Teaching.Score | -0.93 | 0.24 | -3.84 | 0.00 | -1.40 | -0.45 |
| Research.Score | -1.93 | 0.22 | -8.63 | 0.00 | -2.37 | -1.49 |
| Citations.Score | -1.39 | 0.15 | -9.08 | 0.00 | -1.69 | -1.09 |
| Industry.Income.Score | -0.18 | 0.09 | -2.15 | 0.03 | -0.35 | -0.02 |
| International.Outlook.Score | -0.40 | 0.10 | -4.15 | 0.00 | -0.58 | -0.21 |
[1] "Table 1B. University Rank Interaction Model with Major Categorical Scores, Female:Male Ratio, and International Student % as Covariates"
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| (Intercept) | 3189.71 | 1341.18 | 2.38 | 0.02 | 538.29 | 5841.12 |
| Teaching.Score | -50.81 | 23.20 | -2.19 | 0.03 | -96.68 | -4.94 |
| Research.Score | -26.79 | 21.49 | -1.25 | 0.21 | -69.28 | 15.69 |
| Citations.Score | -30.72 | 15.07 | -2.04 | 0.04 | -60.52 | -0.93 |
| Female.Male.Ratio | -1701.80 | 1263.26 | -1.35 | 0.18 | -4199.17 | 795.57 |
| International.Student.Percent | -9494.44 | 5313.29 | -1.79 | 0.08 | -19998.45 | 1009.57 |
| Teaching.Score:Research.Score | 0.49 | 0.33 | 1.50 | 0.14 | -0.16 | 1.13 |
| Teaching.Score:Citations.Score | 0.55 | 0.27 | 2.07 | 0.04 | 0.02 | 1.08 |
| Research.Score:Citations.Score | 0.22 | 0.24 | 0.93 | 0.35 | -0.25 | 0.70 |
| Teaching.Score:Female.Male.Ratio | 35.71 | 21.71 | 1.64 | 0.10 | -7.21 | 78.64 |
| Research.Score:Female.Male.Ratio | 14.67 | 20.99 | 0.70 | 0.49 | -26.83 | 56.17 |
| Citations.Score:Female.Male.Ratio | 19.86 | 14.14 | 1.41 | 0.16 | -8.08 | 47.81 |
| Teaching.Score:International.Student.Percent | 168.79 | 91.13 | 1.85 | 0.07 | -11.38 | 348.96 |
| Research.Score:International.Student.Percent | 86.02 | 84.32 | 1.02 | 0.31 | -80.67 | 252.72 |
| Citations.Score:International.Student.Percent | 104.04 | 59.38 | 1.75 | 0.08 | -13.35 | 221.43 |
| Female.Male.Ratio:International.Student.Percent | 9382.35 | 4781.72 | 1.96 | 0.05 | -70.78 | 18835.48 |
In Table 1A, all five score categories have p-values < 0.05, and an increase in each is associated with a decrease in the response variable (i.e. a better ranking). Teaching, Research, and Citations Scores have the largest coefficients in absolute value, so they are chosen as the variables to include in the interaction model. Table 1B shows the first 16 terms of that model. Neither Female:Male Ratio nor International Student % has a statistically significant two-way interaction with Teaching, Research, or Citations Score at the 5% level. The interaction between Female:Male Ratio and International Student % is borderline: its p-value rounds to 0.05, but its 95% confidence interval (-70.78 to 18835.48) includes zero, so it is at best marginal evidence. Its positive estimate suggests that, as Female:Male Ratio increases, the association between International Student % and University Rank shifts in the positive direction. Because the model also contains higher-order interactions, this coefficient describes that shift only when the remaining scores are zero and should be interpreted with caution.
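Since the p-values in Table 1B are rounded to two decimals, a borderline term can be checked by recomputing its test statistic from the reported estimate and standard error. The sketch below does this for the Female.Male.Ratio:International.Student.Percent row. The residual degrees of freedom (141) are an assumption, chosen because they reproduce the confidence interval printed in the table; they are not shown in the output.

```r
# Values copied from the Female.Male.Ratio:International.Student.Percent row of Table 1B
estimate  <- 9382.35
std_error <- 4781.72

# ASSUMPTION: residual degrees of freedom, not reported in the table
df_resid <- 141

# Two-sided t-test of the coefficient against zero
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval rebuilt from the same ingredients
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

print(round(t_stat, 2))
print(p_value)
print(round(ci, 2))
```

Under that assumption the unrounded p-value can be compared with 0.05 directly, and the lower bound of the interval shows whether zero is excluded.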